Abstract



We give a randomized O(n polylog n)-time approximation scheme for the Steiner forest problem in the Euclidean
plane. For every fixed ε > 0 and given n terminals in the plane with connection requests between some pairs of
terminals, our scheme finds a (1 + ε)-approximation to the minimum-length forest that connects every requested pair
of terminals.

1 Introduction 

1.1 Result and background 

In the Steiner forest problem, we are given a set of n pairs of terminals {(t_i, t'_i)}_{i=1}^n. The goal is to find a
minimum-cost forest F such that every pair of terminals is connected by a path in F. We consider the problem where the
terminals are points in the Euclidean plane. The solution is a set of line segments of the plane; non-terminal points
with more than two line segments adjacent to them in the solution are called Steiner points. The cost of F is the sum
of the ℓ_2 lengths of the line segments comprising it. Our main result is:

Theorem 1.1. There is a randomized O(n polylog n)-time approximation scheme for the Steiner forest problem in the
Euclidean plane.

An approximation scheme is guaranteed, for a fixed ε, to find a solution whose total length is at most 1 + ε times
the length of a minimum solution.

There is a vast literature on algorithms for problems in the Euclidean plane. This work builds on the approximation
scheme for geometric problems, such as Traveling Salesman and Steiner tree, due to Arora [3]. (See [20] for a digest.)
Similar techniques were suggested by Mitchell [16] and improved by Rao and Smith for the Steiner tree and TSP
problems [17]. Concerning approximation schemes, in addition to the work of Arora and Mitchell, others have built
on similar ideas (e.g. [4, 15]).

The Steiner forest problem, a generalization of the Steiner tree problem, is NP-hard [13] and max-SNP complete
[8, 18] in general graphs and in high-dimensional Euclidean space [19]. Therefore, no PTAS exists for these problems
unless P = NP. The 2-approximation algorithm due to Agrawal, Klein and Ravi [1] can be adapted to Euclidean problems
by restricting the Steiner points to lie on a sufficiently fine grid and converting the problem into a graph problem.

We have formulated the connectivity requirements in terms of pairs of terminals. One can equivalently formulate
these in terms of sets of terminals: the goal is then to find a forest in which each set of terminals is connected. Arora
states [3] that his approach yields an approximation scheme whose running time is exponential in the number of sets
of terminals, and this is the only previous work to take advantage of the Euclidean plane to get a better approximation
ratio than that of Agrawal et al. [1].

*This version is more recent than that appearing in the FOCS proceedings. The partition step has been corrected and
the overall presentation has been clarified and formalized. This material is based upon work supported by the National
Science Foundation under Grant Nos. CCF-0635089, CCF-0964037, and CCF-0963921.

†Work done while visiting MIT's CSAIL.

1.2 Recursive dissection 

In Arora's paradigm, the feasible space is recursively decomposed by dissection squares using a randomized variant of
the quadtree (Figure 1). The dissection is a 4-ary tree whose root is a square box enclosing the input terminals. The
width L of the root box is twice the width of the smallest square box enclosing the terminals, and the lower left-hand
corner of the root box is translated from the lower left-hand corner of the bounding box by (−a, −b), where a and b are
chosen uniformly at random from the range [0, L/2). Each node in the tree corresponds to a dissection square. Each
square is dissected into four parts of equal area by one vertical and one horizontal dissection line, each spanning the
breadth of the root box. This process continues until each square contains at most one terminal (or multiple terminals
having the same coordinates).
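
The dissection just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name shifted_dissection, the tuple representation (x, y, width, depth) of a square, and the use of floating-point coordinates are hypothetical choices, not part of the paper, and the sketch makes no attempt at the O(n log n) running time.

```python
import random

def shifted_dissection(terminals, rng=None):
    """Sketch of the randomly shifted quadtree dissection.

    Returns (root, leaves); a square is (x, y, width, depth) and each leaf is
    paired with the list of terminals it contains.
    """
    rng = rng or random.Random()
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    # Width of the smallest square box enclosing the terminals.
    box = max(max(xs) - min(xs), max(ys) - min(ys))
    L = 2 * box
    # Shift (a, b) drawn uniformly from [0, L/2) x [0, L/2).
    a = rng.random() * L / 2
    b = rng.random() * L / 2
    root = (min(xs) - a, min(ys) - b, L, 0)

    def split(square, pts):
        x, y, width, depth = square
        if len(set(pts)) <= 1:           # at most one distinct terminal location
            return [(square, pts)]
        half = width / 2
        children = {(i, j): [] for i in (0, 1) for j in (0, 1)}
        for p in pts:                    # compare against the two dissection lines
            children[(int(p[0] >= x + half), int(p[1] >= y + half))].append(p)
        leaves = []
        for (i, j), inside in children.items():
            child = (x + i * half, y + j * half, half, depth + 1)
            leaves.extend(split(child, inside))
        return leaves

    return root, split(root, list(terminals))
```

Because a and b are strictly less than L/2, every terminal lies inside the shifted root box, and each recursive call splits a square along one vertical and one horizontal dissection line.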

Figure 1: The shifted quad-tree dissection. The shaded box is the bounding box of the terminals. (Figure labels:
depth 1 dissection line, depth 2 dissection line, depth 2 dissection square.)

Feasible solutions are restricted to using a small number of portals, designated points on each dissection line. A 
Structure Theorem states that there is a near-optimal solution that obeys these restrictions. The final solution is found 
by a dynamic program guided by the recursive decomposition. 

In the problems considered by Arora, the solutions are connected. However, the solution to a Steiner forest problem 
is in general disconnected, since only paired terminals are required to be connected. It is not known a priori how the 
connected components partition the terminal pairs. For that reason, maintaining feasibility in the dynamic program 
requires a table that is exponential in the number of terminal pairs. In fact, Arora states [3] that his approach yields
an approximation scheme whose running time is exponential in the number of sets of terminals. 

Nevertheless, here we use Arora's approach to get an approximation scheme whose running time is polynomial 
in the number of sets of terminals. The main technical challenge is in maintaining feasibility in a small dynamic 
programming table. 

1.3 Small dynamic programming table 

We will use Arora's approach of a random recursive dissection. Arora shows (e.g., for Steiner tree) that the optimal
solution can be perturbed (while increasing the length only slightly) so that, for each box of the recursive dissection,
the solution within the box interacts weakly and in a controlled way with the solution outside the box. In particular, the
perturbed solution crosses the boundary of the box only a constant number of times, and only at an O(1)-sized subset
of O(log n) selected points, called portals. The optimal solution that has this property can be found using dynamic
programming.

Unfortunately, for Steiner forest those restrictions are not sufficient: maintaining feasibility constraints cannot be
done with a polynomially-sized dynamic program. To see why, suppose the solution uses only 2 portals between
adjacent dissection squares R_E and R_W. In order to combine the solutions in R_W and R_E in the dynamic program
into a feasible solution in R_W ∪ R_E, we need to know, for each pair (t, t') of terminals with t ∈ R_W and t' ∈ R_E,
which portal connects t and t' (Figure 2(a)). This requires 2^n configurations in the dynamic programming table.

To circumvent the problem in this example, the idea is to decompose R_W and R_E into a constant number of smaller
dissection squares called cells. All terminals in a common cell that go to the boundary use a common portal. Thus,
instead of keeping track of each terminal's choice of portal individually, the dynamic program can simply memoize
each cell's choice of portal. The dynamic program also uses a specification of how portals must be connected outside
the dissection squares. This information is sufficient to check feasibility when combining solutions of the subproblems
for R_W and for R_E. To show near-optimality, we show that a constant number of cells per square is sufficient for
finding a nearly-optimal solution.

Figure 2: Maintaining feasibility is not trivially polynomial-sized.

Basic notation and definitions. For two dissection squares A and B, if A encloses B, we say that B is a descendant
of A and A is an ancestor of B. If no other dissection square is enclosed by A and encloses B, we say that A is the 
parent of B and B is the child of A. We will extend these definitions to describe relationships between cells. The 
depth of a square S is given by its depth in the dissection tree (0 for the root). The depth of a dissection line is the 
minimum depth of squares it separates. Note that a square at depth i is bounded by two perpendicular depth-i lines 
and two lines of depth less than i. 

For a line segment s (open or closed), we use length(s) to denote the ℓ_2 distance between s's endpoints. For a set
of line segments S = {s_1, s_2, ...}, length(S) = Σ_i length(s_i). For a subset X of the Euclidean plane, a component
of X is a maximal subset Y of X such that every pair of points in Y is path-wise connected in X. We use |X| to
denote the number of components of X. The diameter of a connected subset C of the Euclidean plane, diam(C), is
the maximum ℓ_2 distance between any pair of points in C. We use OPT to denote both the line segments forming an
optimal solution and the length of those line segments.



2 The algorithm 



The algorithm starts by finding a rough partition of the terminals which is a coarsening of the connectivity requirements
(Subsection 2.1). We solve each part of this partition independently. We next discretize the problem by moving the
terminals to integer coordinates of a sufficiently fine grid (Subsection 2.2). We will also require that the Steiner points
lie at integer coordinates. We next perform a recursive dissection (Subsection 2.3) and assign points on the dissection
lines as portals (Subsection 2.4), as introduced in Section 1.2. We then break each dissection square into a small number
of cells. We find the best feasible solution F to the discretized problem that only crosses between dissection squares
at portals and such that, for each cell C of dissection square R, F ∩ R has only one component that connects C to the
boundary of R (Subsection 2.5).



We will show that the expected length of F is at most a ^6 fraction longer than OPT. By Markov's inequality, 
with probability at least one-half the length^) < (1 + Ae)OPT. We show that by moving the terminals back to 



their original positions (from their nearest integer coordinates) increases the length by at most ^OPT. Therefore, the 
output solution has length at most (1 + e)OPT with probability one half. 
We now describe each of these steps in detail. 

2.1 Partition 

We first partition the set of terminal pairs, creating subproblems that can be solved independently of each other without 
loss of optimality. The purpose of this partition is to bound the size of the bounding box for each problem in terms 
of OPT. This bound is required for the next step, the result of which allows us to treat this geometric problem as a 
combinatorial problem. This discretization was also key to Arora's scheme, but the bound on the size of the bounding 
box for the problems he considers is trivially achieved. This is not the case for the Steiner forest problem. The size of 
the bounding box of all the terminals in an instance may be unrelated to the length of OPT. 

Let Q be the set of m pairs {(t_i, t'_i)}_{i=1}^m of n terminals. Consider the Euclidean graph whose vertices are the
terminals and whose edges are the line segments connecting terminal pairs in Q, and let C_1, C_2, ... be the components
of this graph. Let dist(Q) = max_i diam(C_i); this is the maximum distance between any pair of terminals that must be
connected.

Theorem 2.1. There exists a partition of Q into independent instances Q_1, Q_2, ... such that the optimal solution for
Q is the disjoint union of optimal solutions for each Q_i and such that the diameter of Q_i is at most n_i^2 dist(Q_i), where
n_i is the number of terminals in Q_i. Further, this partition can be found in O(n log n) time.

We will show that the following algorithm, Partition(Q), produces such a partition. Let T be the minimum spanning 
tree of the terminals in Q . 

Partition(Q, T)

    Let e be the longest edge of T.
    If length(e) > n dist(Q),
        remove e from T and let T_1 and T_2 be the resulting components.
        For i = 1, 2, let Q_i be the subset of terminal pairs connected by T_i.
        T := Partition(Q_1, T_1) ∪ Partition(Q_2, T_2).
    Return the partition defined by the components of T.
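
As a rough illustration of the procedure above, here is a minimal Python sketch. It is not the O(n log n) implementation discussed in the proof: it assumes a quadratic-time Prim computation of the minimum spanning tree, brute-force diameters for dist(Q), and takes n to be the number of terminals in the current subproblem. All function names (mst_edges, requirement_dist, component, partition) are hypothetical.

```python
import math
from itertools import combinations

def mst_edges(points):
    """Prim's algorithm in O(n^2); returns edges as (length, u, v)."""
    n = len(points)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            d = math.dist(points[u], points[v])
            if not in_tree[v] and d < best[v][0]:
                best[v] = (d, u)
    return edges

def requirement_dist(points, pairs):
    """dist(Q): largest diameter of a component of the requirement graph."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for s, t in pairs:
        parent[find(s)] = find(t)
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return max((math.dist(points[u], points[v])
                for g in groups.values() for u, v in combinations(g, 2)),
               default=0.0)

def component(start, edges):
    """Vertices reachable from start using the given tree edges."""
    adj = {}
    for _, u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def partition(points, pairs):
    """Split terminals (indices into points) following Partition(Q, T)."""
    def solve(vertices, tree):
        if not tree:
            return [vertices]
        d = requirement_dist(points, [p for p in pairs if p[0] in vertices])
        longest = max(tree)
        if longest[0] <= len(vertices) * d:   # stopping condition
            return [vertices]
        rest = [e for e in tree if e != longest]
        side = component(longest[1], rest)
        t1 = [e for e in rest if e[1] in side]
        t2 = [e for e in rest if e[1] not in side]
        return solve(side, t1) + solve(vertices - side, t2)
    return solve(set(range(len(points))), mst_edges(points))
```

For example, two tight pairs placed far apart are separated into independent instances, while the same four points with long-range requirements stay together.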



Proof of Theorem 2.1. First observe that by the cut property of minimum spanning trees, the distance between every
terminal in T_1 and every terminal in T_2 is at least as long as the edge that is removed.

Since a feasible solution is given by the union of minimum spanning trees of the sets of the requirement partition,
and each edge in these trees has length at most dist(Q), OPT < n dist(Q). OPT cannot afford to connect a terminal
of T_1 to a terminal of T_2, because the distance between any terminal in T_1 and any terminal in T_2 is at least the
length of the removed edge, which is greater than n dist(Q) and hence greater than this upper bound on OPT. (By
definition of dist, there cannot have been a requirement to connect a terminal of T_1 to a terminal of T_2.) Therefore,
OPT must be the union of two solutions, one for the terminals contained by T_1 and one for the terminals contained
by T_2. Inductively, the optimal solution for Q is the union of optimal solutions for each set in Partition(Q), giving
the first part of the theorem.

The stopping condition of PARTITION guarantees that there is a spanning tree of the terminals in the current subset
Q_i of terminals whose edges each have length at most n_i dist(Q_i). Therefore, there is a path between each pair of
terminals of length at most n_i^2 dist(Q_i), giving the second part of the theorem.

Finally, we show that PARTITION can be implemented to run in O(n log n) time. The diameter of a set of points
in the Euclidean plane can be computed by first finding a convex hull, and this can be done in O(n log n) time by, for
example, Graham's algorithm [12]. Therefore, dist(C_i) can be computed in O(n log n) time. The terminal-pair sets
Q_1 and Q_2 for the subproblems need not be computed explicitly, as the required information is given by T_1 and T_2.
By representing T with a top-tree data structure, we can find n_i and dist(Q_i) by way of a cut operation and a sum
and maximum query, respectively, in O(log n) time [11]. Since there are O(n) recursive calls, the total time for the
top-tree operations is O(n log n). □
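
The diameter-via-convex-hull step mentioned in the proof can be sketched as follows. This is a hedged illustration, not the paper's implementation: it swaps in Andrew's monotone chain for Graham's algorithm, and it compares all pairs of hull vertices, which is quadratic in the hull size; a rotating-calipers pass over the hull would be needed to match the O(n log n) bound. The names convex_hull and diameter are hypothetical.

```python
import math
from itertools import combinations

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def diameter(points):
    """Maximum distance between two points; only hull vertices need checking."""
    hull = convex_hull(points)
    return max((math.dist(p, q) for p, q in combinations(hull, 2)), default=0.0)
```

The key fact used is that a farthest pair of points always consists of two vertices of the convex hull.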



Our PTAS finds an approximately optimal solution to each subproblem Q_i (as defined by Theorem 2.1) and combines
the solutions. For the remainder of our description of the algorithm, we focus on how the algorithm addresses
one such subproblem Q_i. In order to avoid carrying over subscripts and arguments Q_i, dist(Q_i), n_i throughout the
paper, from now on we will consider an instance given by Q, dist(Q), and n, and assume it has the property that the
maximum distance between terminals, whether belonging to a requirement pair or not, is at most n^2 dist(Q). OPT will
refer to the length of the optimal solution for this subproblem.



2.2 Discretize 

We would like to treat the terminals as discrete combinatorial objects. In order to do so, we assume that the coordinates 
of the terminals lie on an integer grid. We can do so by scaling the instance, but this may result in coordinates of 
unreasonable size. Instead, we scale by a smaller factor and round the positions of the terminals to their nearest 
half-integer coordinates. 



Scale

We scale by a factor of

    40√2 n / (ε dist(Q)).

Before scaling, OPT ≥ dist(Q), the distance between the furthest pair of terminals that must be connected. After
scaling we get the following lower bound:

    OPT ≥ 40√2 n / ε.    (1)

Before scaling, diam(Q) ≤ n^2 dist(Q) by Theorem 2.1. After scaling we get the following upper bound on the
diameter of the terminals:

    diam(Q) ≤ 40√2 n^3 / ε.    (2)

Herein, OPT refers to distances in the scaled version.



Round 

We round the position of each terminal to the nearest grid center. Additionally, we will search for a solution that only 
uses Steiner points that are grid centers. We call this constrained problem the rounded problem. The rounded problem 
may merge terminals (and thus, their requirements). 
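
The scale and round steps can be illustrated with a short Python sketch. It assumes that "grid center" means the center of a unit grid cell, that is, a point whose coordinates are both half-integers; the function name scale_and_round and its return layout are hypothetical.

```python
import math

def scale_and_round(points, eps, dist_q):
    """Scale by 40*sqrt(2)*n / (eps * dist(Q)), then snap to unit-cell centers."""
    n = len(points)
    factor = 40 * math.sqrt(2) * n / (eps * dist_q)
    scaled = [(factor * x, factor * y) for x, y in points]
    # The center of the unit cell containing (x, y) is (floor(x) + 1/2, floor(y) + 1/2).
    rounded = [(math.floor(x) + 0.5, math.floor(y) + 0.5) for x, y in scaled]
    return factor, scaled, rounded
```

Each coordinate moves by at most 1/2, so every terminal moves by at most 1/√2, half the diagonal of a unit square.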

Lemma 2.2. A solution to the Steiner forest problem can be derived from an optimal solution to the rounded problem at
additional cost at most (ε/80)OPT.

Proof. Let F be an optimal solution to the rounded problem. From this we build a solution to the original problem
by connecting the original terminals to their rounded counterparts with line segments of length at most 1/√2, i.e. half
the length of the diagonal of a unit square. There are n terminals, so the additional length is at most n/√2, which is at
most (ε/80)OPT by Equation (1). □

Let F be an optimal solution to the rounded problem. We relate the number of intersections of F with grid lines
to length(F). We will bound the cost of our restrictions to portals and cells with this relationship.

Lemma 2.3. There is a solution F to the rounded problem of length at most (1 + ε/10)OPT that satisfies

    Σ_{grid lines ℓ} |F ∩ ℓ| ≤ 3 OPT.    (3)

Proof. We build a solution F to the rounded problem from OPT by replacing each line segment e of OPT with a
line segment e' that connects the half-integer coordinates that are nearest e's endpoints (breaking ties arbitrarily but
consistently). The additional length needed for this transformation is at most twice (once for each endpoint of e) the
distance from a point to the nearest half-integer coordinate:

    length(e') ≤ length(e) + √2.

Since OPT has at most n leaves, OPT has fewer than n Steiner points and so has fewer than 4n edges. The additional
length is therefore no greater than 4√2 n. Combining with Equation (1), this is at most (ε/10)OPT.

F is composed of line segments whose endpoints are half-integer coordinates. Such a segment S of length s can
cross at most s horizontal grid lines and at most s vertical grid lines. Therefore

    Σ_{grid lines ℓ} |S ∩ ℓ| ≤ 2s,

and summing over all segments of F gives

    Σ_{grid lines ℓ} |F ∩ ℓ| ≤ 2 length(F) ≤ 2(1 + ε/10)OPT ≤ 3 OPT,

where the last inequality follows from ε < 1. □
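
The counting step in the proof, that a segment of length s with half-integer endpoints crosses at most 2s grid lines, can be checked numerically with a small sketch. The helper grid_crossings is hypothetical and assumes the grid lines are the lines x = k and y = k for integers k.

```python
import math

def grid_crossings(p, q):
    """Count unit-grid lines crossed by segment pq with half-integer endpoints."""
    def integers_between(u, v):
        lo, hi = min(u, v), max(u, v)
        # Number of integers strictly inside (lo, hi) when lo, hi are not integers.
        return math.ceil(hi) - math.floor(lo) - 1
    vertical = integers_between(p[0], q[0])     # lines x = k
    horizontal = integers_between(p[1], q[1])   # lines y = k
    return vertical + horizontal
```

For a segment from (0.5, 0.5) to (3.5, 4.5), of length 5, the count is 3 + 4 = 7, within the bound 2s = 10.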



From here on out, our goal is to find the solution that is guaranteed by Lemma 2.3. We will not be able to find this
solution optimally, but will be able to find a solution within our error bound of ε OPT.

2.3 Dissect 

The recursive dissection starts with an L × L box that encloses the terminals and where L is at least twice as big as
needed. This allows some choice in where to center the enclosing box. We make this choice randomly. This random
choice is used in bounding the incurred cost, in expectation, of structural assumptions (Section 4.3) that help to reduce
the size of the dynamic programming table.

Formally, let L be the smallest power of 2 greater than 2 · diam(Q). In combination with Equation (2), we get
the following upper bound on L:

    L ≤ (160√2 / ε) n^3.    (4)

The x-coordinate (and likewise the y-coordinate) of the lower left corner of the enclosing box is chosen uniformly at
random from the L/2 integer coordinates that still result in an enclosing box. We will refer to this as the random shift.



As described in Section 1.2, we perform a recursive dissection of this enclosing box. This can be done in O(n log n)
time [7]. By our choice of L and the random shift, this dissection only uses the grid lines. Since the recursive dissection
stops with unit dissection squares, the quad-tree has depth log L.

Consider a vertical grid line ℓ. Since there are L/2 values of the horizontal shift, and 2^{i−1} of these values will
result in ℓ being a depth-i dissection line, we get

    Prob[depth(ℓ) = i] = 2^i / L.    (5)

2.4 Designate portals 

We designate a subset of the points on each dissection line as portals. We will restrict our search to feasible solutions
that cross dissection lines at portals only. We use the portal constant A, where

    A is the smallest power of two greater than 30 ε^{-1} log L.    (6)

Formally, for each vertical (resp. horizontal) dissection line ℓ, we designate as portals of ℓ the points on ℓ with
y-coordinates (resp. x-coordinates) which are integral multiples of

    L / (A · 2^{depth(ℓ)}).

There are no portals on the sides of the root dissection square, the bounding box. Since a square at depth i has sidelength
L/2^i and is bounded by 4 dissection lines at depth at most i, we get:



Lemma 2.4. A dissection square has at most 4A portals on its boundary.
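
To make the portal definitions concrete, the following Python sketch computes the portal constant of Equation (6) and the portal spacing on a dissection line of a given depth. It assumes that log denotes the base-2 logarithm (consistent with the quad-tree having depth log L); the function names are hypothetical.

```python
import math

def portal_constant(eps, L):
    """Smallest power of two greater than 30 * log2(L) / eps (assumes base-2 log)."""
    bound = 30 * math.log2(L) / eps
    A = 1
    while A <= bound:
        A *= 2
    return A

def portal_spacing(L, A, depth):
    """Distance between consecutive portals on a dissection line of the given depth."""
    return L / (A * 2 ** depth)
```

With these definitions, a side of a depth-i square has length L/2^i, which is exactly A times the portal spacing on a depth-i line; lines of smaller depth have coarser spacing, which is what keeps the number of portals per side bounded in terms of A.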

Consider perpendicular dissection lines ℓ and ℓ'. A portal p of ℓ may happen to be a point of ℓ' (namely, the
intersection point), but p may not be a portal of ℓ', that is, it may not be one of the points of ℓ' that were designated
according to the above definition.

The following lemma will be useful in Subsection 4.2 for technical reasons.

Lemma 2.5. For every dissection square R, the corners of R are portals (except for the points that are corners of the
bounding box).

Proof. Consider a square R at depth i. Consider the two dissection lines ℓ and ℓ' that divide R into 4. The depth of
these lines is i + 1. These lines restricted to R, namely ℓ_R = ℓ ∩ R and ℓ'_R = ℓ' ∩ R, have length L/2^i, a power of 2.
Portals are designated as integral multiples of L/(2^{i+1} A), also a power of 2 and a 1/(2A) fraction of the length of ℓ_R
and ℓ'_R. It follows that the endpoints and intersection point of ℓ_R and ℓ'_R are portals of these lines. □

2.5 Solve via dynamic programming 

In order to overcome the computational difficulty associated with maintaining feasibility (as illustrated in Figure 2),
we divide each dissection square R into a regular B × B grid of cells; B, which will be defined later, is O(1/ε) and
is a power of 2. Each cell of the grid is either coincident with a dissection square or is smaller than the leaf dissection
squares. Consider parent and child dissection squares R_P and R_C; a cell C of R_P encloses four cells of R_C.

The dynamic programming table for a dissection square R will be indexed by two subpartitions (partition of a 
subset) of the portals and cells of R; one subpartition will encode the connectivity achieved by a solution within R and 
the other will encode the connectivity required by the solution outside R in order to achieve feasibility. The details are 
given in the next section. 

3 The Dynamic Program 

3.1 The dynamic programming algorithm 

The dynamic program will only encode subsolutions that have low complexity and permit feasibility. We call such 
subsolutions conforming. We build a dynamic programming table for each dissection square. The table is indexed by 
valid configurations and the entry will be the best compatible conforming subsolution. 

Low complexity and feasible: conforming subsolutions 

Let R be a dissection square or a cell, and let F be a finite number of line segments of R. We say that F conforms to
R if it satisfies the following properties:

• (boundary property) |F ∩ ∂R| ≤ 4(D + 1).

• (portal property) Every connected component of F ∩ ∂R contains a portal of R.

• (cell property) Each cell C of R intersects at most one connected component of F that also intersects ∂R.

• (terminal property) If a terminal t ∈ R is not connected to its mate by F then it is connected to ∂R by F.

The constant D is defined in Equation (7) and is O(1/ε). Note that the first three properties are those that bound the
complexity of the allowed solutions and the last guarantees feasibility. We say that a solution F recursively conforms
to R if it conforms to all descendant dissection squares of R (including R). We say that a solution F is conforming if
it recursively conforms to the root dissection square with every terminal connected to its mate. It is a trivial corollary
of the last property that a conforming solution is a feasible solution to the Steiner forest problem. We will restate and
prove the following in Section 4; the remainder of this section will give a dynamic program that finds a conforming
solution.

Theorem 3.1 (Structure Theorem). There is a conforming solution that has, in expectation over the random shift of 
the bounding box, length at most (1 + §)OPT. 



Indices of the dynamic programming table: valid configurations 



The dynamic programming table DP_R for a dissection square R will be indexed by subpartitions of the portals and
cells of R that we call configurations. A configuration of R is a pair (π^in, π^out) with the following properties: π^in is a
subpartition of the cells and portals of R such that each part contains at least one portal and at least one cell; π^out is
a coarsening of π^in. See Figure 3. π^in will characterize the behaviour of the solution inside R while π^out will encode
what connections remain to be made in order to make the solution feasible. For a terminal t ∈ R, we use C_R[t] to
denote the cell of R that contains t. We say a configuration is valid if it has the following properties:

• (compact) π^in has at most 4(D + 1) parts and contains at most 4(D + 1) portals.

• (connecting) For every terminal t in R whose mate is not in R, C_R[t] is in a part of π^in. For every pair of mated
terminals t, t' in R, either C_R[t] and C_R[t'] are in the same part of π^out or neither C_R[t] nor C_R[t'] is in π^in.

The connecting property will allow us to encode and guarantee feasible solutions. Since a dissection square has 4A
portals (Lemma 2.4) and B^2 cells, the first property bounds the number of configurations:



Lemma 3.2. There are at most (4A + B^2)^{O(D)} or (ε^{-2} log n)^{O(1/ε)} compact configurations of a dissection square.

We will use the following notation to work with configurations: For a subpartition π of S and an element x ∈ S,
we use π[x] to denote the part of π containing x if there is one, and ∅ otherwise. For two subpartitions π and π' of a
set S, we use π ∨ π' to denote the finest possible coarsening of the union of π and π'. If we eliminate the elements
that are in partition π' but not in partition π, π ∨ π' is a coarsening of π' and vice versa.
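
The two operations on subpartitions can be sketched in Python as follows. This is an illustrative sketch only: a subpartition is represented, by assumption, as a list of disjoint Python sets, and the names part_of and join are hypothetical.

```python
def part_of(subpartition, x):
    """Return the part containing x, or the empty set if x is in no part."""
    for part in subpartition:
        if x in part:
            return part
    return set()

def join(*subpartitions):
    """Finest common coarsening of the union: merge any parts that share an element."""
    parts = []
    for sub in subpartitions:
        for part in sub:
            merged = set(part)
            # Absorb every existing part that overlaps the new one.
            for q in [q for q in parts if q & merged]:
                merged |= q
                parts.remove(q)
            parts.append(merged)
    return parts
```

For instance, joining {{1, 2}, {3}} with {{2, 3}, {4}} merges the first three elements into one part, because the part {2, 3} overlaps both {1, 2} and {3}, and leaves {4} on its own.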



Figure 3: A dissection square and cells (grid), terminal pairs (triangles and pentagons) and unmated terminal (square),
and subsolution (dark lines). The grey components give the parts of π^in with portals (half-disks). To be a valid
configuration, the two parts containing the pentagon terminals must be in the same part of π^out. The subsolution
conforms to R and is compatible with (π^in, π^out).



Entries of the dynamic programming table: compatible subsolutions 

The entries of the dynamic programming table for dissection square R are compatible subsolutions. Formally, a
subsolution F and configuration (π^in, π^out) of R are compatible if and only if π^in has one part
for every connected component of F that intersects ∂R and that part consists of the cells and portals of R intersected
by that connected component (Figure 3). Note that as a result, some valid configurations will not have a compatible
subsolution: if a part of π^in contains disconnected cells with terminals inside, then no set of line segments can connect
these terminals and be contained by the cells of that part. The entries corresponding to such configurations will indicate
this with ∞.



Observation 3.3. If F conforms to R then (π^in, π^out) is a valid configuration.

As is customary, our dynamic program finds the value of the solution; it is straightforward to augment the program
so that the solution itself can be obtained. Our procedure for filling the dynamic programming tables, POPULATE, will
satisfy the following theorem:

Theorem 3.4. POPULATE(R) returns a table DP_R such that, for each valid configuration (π^in, π^out) of R, DP_R[π^in, π^out]
is the minimum length of a subsolution that recursively conforms to R and that is compatible with (π^in, π^out).

We prove this theorem in Section 3.3.



Consistent configurations 

A key step of the dynamic program is to correctly match up the subsolutions of the child dissection squares R_1, ..., R_4
of R_0. Consider valid configurations (π_i^in, π_i^out) for i = 0, ..., 4 and let π_0 = ∨_{i=1}^4 π_i^in. We say that the configurations
(π_i^in, π_i^out) for i = 0, ..., 4 are consistent if they satisfy the following connectivity requirements:

1. (internal) π_0^in is given by π_0 with portals of R_i that are not portals of R_0 removed, parts that do not contain
portals of R_0 removed, and each cell of R_i replaced by the corresponding (parent) cell of R_0. (If non-disjoint
parts result from replacing cells by their parents, then the result is not a partition and cannot be π_0^in.)

2. (external) For two elements (cells and/or portals) x, x' of R_i, π_i^out[x] = π_i^out[x'] if and only if π_0[x] = π_0[x'] or
there are portals p, p' such that π_0^out[p] = π_0^out[p'], π_0[x] = π_0[p], and π_0[x'] = π_0[p'].

3. (terminal) For mated terminals t ∈ R_i and t' ∈ R_j with 1 ≤ i < j ≤ 4, either π_0[C_i[t]] = π_0[C_j[t']] or



Dynamic programming procedure 

We now give the procedure POPULATE that fills the dynamic programming tables. The top dissection square R has a
single entry, the entry corresponding to the configuration (∅, ∅). The desired solution is therefore given by DP_R[∅, ∅]
after filling the table DP_R with POPULATE(R). The corresponding solution is conforming. The following procedure
is used to populate the entries of DP_R. The procedure is well defined when the tables are filled for dissection squares
in bottom-up order.

POPULATE(R_0)

    If R_0 contains at most one terminal, then    % R_0 is a leaf dissection square
        For every valid configuration (π^in, π^out) of R_0,
            DP_{R_0}[π^in, π^out] := 0
            For every part P of π^in,
                if the cells of P are connected and contain the portals (and terminal) of P,
                    F_P := minimum-length set of lines in the cells of P that
                           connects the portals in P (and terminal, if in P),
                    DP_{R_0}[π^in, π^out] := DP_{R_0}[π^in, π^out] + length(F_P);
                otherwise, DP_{R_0}[π^in, π^out] := ∞.    % no subsolution conforms to (π^in, π^out)

    Otherwise,    % R_0 is a non-leaf dissection square
        let R_1, R_2, R_3, R_4 denote the children of R_0.
        For every valid configuration (π_0^in, π_0^out) of R_0, initialize DP_{R_0}[π_0^in, π_0^out] := ∞.
        For every quintuple of indices {(π_i^in, π_i^out)}_{i=0}^4 to {DP_{R_i}}_{i=0}^4,
            if {(π_i^in, π_i^out)}_{i=0}^4 are consistent,
                DP_{R_0}[π_0^in, π_0^out] := min{DP_{R_0}[π_0^in, π_0^out], Σ_{i=1}^4 DP_{R_i}[π_i^in, π_i^out]}.
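
The non-leaf case of POPULATE has a simple shape that the following Python sketch tries to capture. It is a schematic illustration only: configurations are treated as opaque hashable keys, the consistency test is passed in as a hypothetical callback consistent(cfg0, child_cfgs) rather than implemented, and the function name combine is likewise an assumption.

```python
import math
from itertools import product

def combine(parent_configs, child_tables, consistent):
    """Fill the parent table from the four child tables.

    parent_configs: iterable of valid configurations of the parent square.
    child_tables:   list of dicts mapping a child configuration to its DP value.
    consistent:     callback deciding whether a parent configuration and a
                    tuple of child configurations are consistent.
    """
    table = {cfg0: math.inf for cfg0 in parent_configs}
    for cfg0 in table:
        # Enumerate one configuration per child square.
        for child_cfgs in product(*(t.keys() for t in child_tables)):
            if consistent(cfg0, child_cfgs):
                total = sum(t[c] for t, c in zip(child_tables, child_cfgs))
                table[cfg0] = min(table[cfg0], total)
    return table
```

An entry stays at infinity when no consistent choice of child configurations has finite total length, mirroring the initialization step of the procedure.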



3.2 Running time 



Since each part of π^in contains O(D) portals (since π^in is compact), F_P is a Steiner tree of O(D) terminals (portals
and possibly one terminal) among the cells of π^in. To avoid the cells that are not in π^in, we will require at most O(B^2)
Steiner points. F_P can be computed by enumeration in time depending only on B and D (which are O(1/ε)). Since the
number of compact configurations is polylogarithmic and since there are O(n log n) dissection squares, the running
time of the dynamic program is therefore O(n log^ℓ n), where ℓ is a constant depending on ε.



3.3 Correctness (proof of Theorem 3.4)



We prove Theorem 3.4, giving the correctness of our dynamic program, by bottom-up induction. In the following, we
use the notation, definitions and conditions of POPULATE. The base cases of the induction correspond to dissection
squares that contain at most one terminal. If any part P of π^in contains cells or portals that are disconnected, then
there is no subsolution that is compatible with π^in and DP_{R_0}[π^in, π^out] = ∞ represents this. Otherwise the subsolution
F_0 that is given by the union of {F_P : part P of π^in} is compatible with π^in by construction. Further, F_0 satisfies the
terminal property of conformance with R_0 by construction and the remaining properties since it is compatible with a
valid configuration.

When R_0 contains more than one terminal, for a valid configuration (π_0^in, π_0^out) of R_0, we must prove:

Soundness. If DP_{R_0}[π_0^in, π_0^out] is finite then there is a subsolution F_0 that recursively conforms to R_0, is compatible
with (π_0^in, π_0^out) and whose length is DP_{R_0}[π_0^in, π_0^out].

Completeness. Any minimal subsolution F that recursively conforms to R_0 and is compatible with (π_0^in, π_0^out) has
length at least DP_{R_0}[π_0^in, π_0^out].

The proof of Theorem 3.4 follows directly from this. We will use the following lemma:

Lemma 3.5. Let $\{(\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}})\}_{i=0}^{4}$ be consistent configurations for dissection square $R_0$ and child dissection squares $R_1, \ldots, R_4$. For $i = 1, \ldots, 4$, let $F_i$ be a subsolution that recursively conforms to $R_i$ and is compatible with $(\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}})$. Then $\bigcup_{i=1}^{4} F_i$ recursively conforms to $R_0$ and is compatible with $(\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}})$.

Proof. Recall that $F_0$ is compatible with $(\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}})$ if $\pi_0^{\mathrm{in}}$ has one part for every connected component of $F_0$ that intersects $\partial R_0$ and that part consists of the cells and portals intersected by that component. Consider a component $K$ of $F_0$ that intersects $\partial R_0$. There must be a child dissection square $R_i$ with a part of $\pi_i^{\mathrm{in}}$ that consists of the cells and portals intersected by $K \cap R_i$. Consider all such parts $P_j$, $j = 1, \ldots$. (Note that there may be more than one such part from a given child dissection square.) These parts belong to a part $P$ of $\pi_0^{\mathrm{in}}$.

We argue that no other child configuration parts make up $P$. For a contradiction, suppose another part $P'$ is in the makeup of $P$. Since $(\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}})$ is consistent with the child configurations, $P'$ cannot share a cell with any of $P_j$, $j = 1, \ldots$, for otherwise $P$ would not survive the pruning given by the internal connectivity requirement of consistency. Therefore, $P'$ must share a portal with some $P_j$; the corresponding components $K'$ and $K_j$ would therefore also share this portal, implying that $K \cup K'$ is connected, a contradiction.

Again, by the internal connectivity requirement of consistency, $P$ is obtained from $P_j$, $j = 1, \ldots$, by:

• Removing the portals that are not on $\partial R_0$. The remaining portals are on $\partial R_0$, and $K$ connects them since $K_j$, $j = 1, \ldots$, connect them by the inductive hypothesis.

• Replacing each cell $C$ of $P_j$ by the parent cell, which entirely contains $C \cap K$.

Finally, $P$ is not removed altogether since $K$ intersects $\partial R_0$ and this intersection must contain a portal of $R_0$. Therefore, there is a part of $\pi_0^{\mathrm{in}}$ obtained from $P$ that contains all the cells and portals intersected by $K$. □
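The merging step used in this proof (child parts that share a portal end up in the same parent part) can be pictured with a small union-find sketch. The representation of a part as a set of portal labels and the name `merge_parts_by_portal` are assumptions made for illustration only; the consistency definition in the paper also involves cells and pruning, which this sketch deliberately omits.

```python
def merge_parts_by_portal(parts):
    """Group child parts that are linked through shared portals.

    `parts` is a list of sets of portal labels (a hypothetical encoding).
    Returns a sorted list of groups, each a sorted list of part indices.
    """
    parent = list(range(len(parts)))

    def find(x):
        # Find the representative of x, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner = {}  # portal label -> index of the first part seen containing it
    for idx, part in enumerate(parts):
        for portal in part:
            if portal in owner:
                # Two parts share this portal, so they belong together.
                parent[find(idx)] = find(owner[portal])
            else:
                owner[portal] = idx

    groups = {}
    for idx in range(len(parts)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())
```

For example, parts `{p1, p2}`, `{p2, p3}` and `{p3}` collapse into one group, while a part `{p4}` that shares no portal stays on its own.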



Proof of soundness 

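If $\mathrm{DP}_{R_0}[\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}}]$ is finite, then there must be entries $\mathrm{DP}_{R_i}[\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}}]$ that are finite for $i = 1, \ldots, 4$ and such that $\mathrm{DP}_{R_0}[\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}}] = \sum_{i=1}^{4} \mathrm{DP}_{R_i}[\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}}]$. Then, by the inductive hypothesis, for $i = 1, \ldots, 4$, there is a subsolution $F_i$ that recursively conforms to $R_i$, has length $\mathrm{DP}_{R_i}[\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}}]$, and is compatible with $(\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}})$. We simply define $F_0 = \bigcup_{i=1}^{4} F_i$; by definition, $F_0$ has the desired length. By Lemma 3.5, $F_0$ is compatible with $(\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}})$. We show that $F_0$ conforms to $R_0$ by verifying the four properties of conformance.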



$F_0$ satisfies the portal property Let $K$ be a component of $F_0 \cap \partial R_0$. For some child $R_i$, the intersection of $K$ with $\partial R_i \cap \partial R_0$ is nonempty. Since $F_i$ satisfies the portal property, $K \cap \partial R_i \cap \partial R_0$ must also contain a portal; that portal is also a portal of $R_0$.



$F_0$ satisfies the cell property Let $C$ be a cell of $R_0$ that is enclosed by child dissection square $R_i$. Suppose for a contradiction that two connected components $K_1$ and $K_2$ intersect both $C$ and $\partial R_0$. Then $K_1 \cap R_i$ and $K_2 \cap R_i$ must be connected components of $F_i$ that intersect cells $C_1$ and $C_2$, respectively, and $\partial R_i$, where $C_1$ and $C_2$ are child dissection squares of $C$. Since $F_i$ satisfies the cell property w.r.t. $R_i$, $C_1 \neq C_2$ and these cells belong to parts $P_1 \neq P_2$ of $\pi_i^{\mathrm{in}}$. By the internal connectivity requirement of consistency, these cells would both get replaced by $C$, implying that $\pi_0^{\mathrm{in}}$ has two parts containing the same cell, a contradiction.

$F_0$ satisfies the terminal property Consider a terminal $t$ in $R_i$ and $R_0$ such that $C_{R_i}[t]$ is in a part $P$ of $\pi_i^{\mathrm{in}}$ (for otherwise, the terminal property follows from the inductive hypothesis). If $t$'s mate is not in $R_0$, then, by the connecting property of valid configurations, $C_{R_0}[t]$ is in a part of $\pi_0^{\mathrm{in}}$ and the terminal property follows from compatibility. So suppose $t$'s mate, $t'$, is in $R_0$ (and child $R_j$).

Since the configurations are valid, $C_{R_j}[t']$ is in a part $P'$ of $\pi_j^{\mathrm{in}}$. If $\pi_0^{\mathrm{out}}[C_{R_0}[t]] = \pi_0^{\mathrm{out}}[C_{R_0}[t']]$, the terminal property follows from compatibility. If not, then by the terminal connecting property of configuration consistency, $\pi_0^{\mathrm{in}}[C_{R_i}[t]] = \pi_0^{\mathrm{in}}[C_{R_j}[t']]$. Since parts of child configurations cannot share cells, there must be a series of parts $P_1, \ldots, P_k$ where $P_1$ contains $C_{R_i}[t]$, $P_k$ contains $C_{R_j}[t']$ and parts $P_\ell$ and $P_{\ell+1}$ contain a common portal $p_\ell$ for $\ell = 1, \ldots, k-1$. Since $F_1, \ldots, F_4$ are compatible with $\pi_1^{\mathrm{in}}, \ldots, \pi_4^{\mathrm{in}}$, respectively, by the inductive hypothesis, there is a component $K_\ell$ in $\bigcup_{i=1}^{4} F_i$ that connects $t$ and $p_1$ (for $\ell = 1$), $p_{\ell-1}$ and $p_\ell$ (for $\ell = 2, \ldots, k-1$), and $p_{k-1}$ and $t'$ (for $\ell = k$). $\bigcup_{\ell=1}^{k} K_\ell$ is a component in $F_0$ that connects $t$ and $t'$, giving the terminal property.

$F_0$ satisfies the boundary property Since $(\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}})$ is a valid configuration, $\pi_0^{\mathrm{in}}$ has at most $4(D + 1)$ parts. By compatibility, $F_0$ has at most $4(D + 1)$ components intersecting $\partial R_0$. This proves the boundary property of conformance.



Proof of completeness 

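Let $F_0$ be any minimal subsolution that recursively conforms to $R_0$ and is compatible with $(\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}})$. We show that $F_0$ has length at least $\mathrm{DP}_{R_0}[\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}}]$, proving completeness. For $i = 1, \ldots, 4$, let $F_i = F_0 \cap R_i$. Since $F_0$ recursively conforms to $R_0$, $F_i$ recursively conforms to $R_i$. For $i = 1, \ldots, 4$, let $(\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}})$ be a configuration of $R_i$ that is compatible with $F_i$. By Observation 3.3, $(\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}})$ is a valid configuration. By the inductive hypothesis, $\mathrm{length}(F_i) \ge \mathrm{DP}_{R_i}[\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}}]$. It follows that $\mathrm{length}(F_0) \ge \sum_{i=1}^{4} \mathrm{DP}_{R_i}[\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}}]$. If the child configurations $\{(\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}})\}_{i=1}^{4}$ are consistent with $(\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}})$, then $\sum_{i=1}^{4} \mathrm{DP}_{R_i}[\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}}]$ will be an argument to the minimization in POPULATE and therefore $\mathrm{length}(F_0) \ge \mathrm{DP}_{R_0}[\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}}]$. It is therefore sufficient to show that the child configurations $\{(\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}})\}_{i=1}^{4}$ are consistent with $(\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}})$. Equivalently, by Lemma 3.5, $F_0$ is compatible with the configuration $(\pi_0^{\mathrm{in}}, \pi_0^{\mathrm{out}})$ that is consistent with $\{(\pi_i^{\mathrm{in}}, \pi_i^{\mathrm{out}})\}_{i=1}^{4}$ according to the connectivity requirements of consistency.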

This completes the proof of Theorem 3.4.
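The recurrence that soundness and completeness refer to (a parent entry is the minimum, over consistent choices of child configurations, of the sum of the child entries) can be sketched as follows. This is a brute-force illustration only: the table layout, the name `combine`, and the `is_consistent` callback are assumptions, and the actual POPULATE procedure restricts the enumeration to compact, valid configurations.

```python
import itertools
import math


def combine(child_tables, parent_config, is_consistent):
    """Return the parent DP entry as a minimum over consistent child choices.

    child_tables: one dict per child square, mapping a (hypothetical)
        configuration key to the DP value of that child.
    is_consistent: callback deciding whether a tuple of child
        configurations is consistent with `parent_config`.
    """
    best = math.inf  # stays infinite when no consistent choice exists
    for choice in itertools.product(*(t.items() for t in child_tables)):
        configs = [config for config, _ in choice]
        cost = sum(value for _, value in choice)
        if cost < best and is_consistent(parent_config, configs):
            best = cost
    return best
```

An infinite return value plays the role of $\mathrm{DP} = \infty$: no combination of child configurations is consistent with the parent configuration.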



4 Proof of the Structure Theorem (Theorem 3.1)



In this section we give a proof of the Structure Theorem (Theorem 3.1). We restate and reword the theorem here for convenience. It is easy to see that the statement here is equivalent to the statement given in Section 3: only the terminal property of conformance is missing, but that is encoded by feasibility.



Theorem 3.1 (Structure Theorem). There is a feasible solution $F$ to the rounded Steiner forest problem having, in
expectation over the random shift of the bounding box, length at most ^eOPT more than OPT such that each dissection 
square R satisfies the following three properties: 

Boundary Property For each side $S$ of $R$, $F \cap S$ has at most $D$ non-corner components, where

D = QQe- 1 (7) 



Portal Property Each component of $F \cap \partial R$ contains a portal.

Cell Property For each cell $C$ of $R$, $F$ has at most one component that intersects both $\partial C$ and $\partial R$.

First, in a way similar to Arora, we show the existence of a nearly-optimal solution that crosses the boundary of each dissection square a small number of times (Boundary Property) and does so at portals (Portal Property). To that end, starting with the solution $F_0$ as guaranteed by Lemma 2.3, we augment $F_0$ to create a solution $F_1$ that satisfies the Boundary Property, then augment $F_1$ to a solution $F_2$ that also satisfies the Portal Property. The Cell Property is then achieved by carefully adding to $F_2$ boundaries of cells that violate the Cell Property.



By Lemma 2.3 Fq is longer than OPT by ^jOPT. We show that we incur an additional ^OPT in length in 

tV 



satisfying each of these three properties, for a total increase in length of y^eOPT, giving the Theorem 



4.1 The Boundary Property 

We establish the Boundary Property constructively by starting with $F_1 = F_0$ and adding closures of the intersection of $F_1$ with the sides of dissection squares. For a subset $X$ of a line, let $\mathrm{closure}(X)$ denote the minimum connected subset of the line that spans $X$. For a side $S$ of a dissection square $R$, a connected component of a subset of $S$ is a non-corner component if it does not include a corner of $R$. The construction is a simple greedy bottom-up procedure:

SatisfyBoundary:

  For each $j$ decreasing from $\log L$ to 0,
    for each dissection line $\ell$ such that $\mathrm{depth}(\ell) \le j$,
      for each $j$-square with a side $S \subseteq \ell$,
        if $|\{\text{non-corner components of } F_1 \cap S\}| > D$,
          add closure(non-corner components of $F_1 \cap S$) to $F_1$.
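The innermost step of the procedure above can be illustrated on a single side. The sketch below is not taken from the paper: it assumes a side is parameterized as the interval `[0, side_length]`, that components are given as disjoint closed intervals `(lo, hi)` (a crossing point is a degenerate interval), and that a component is a corner component exactly when it touches an endpoint of the side. The name `satisfy_boundary_on_side` is hypothetical.

```python
def satisfy_boundary_on_side(components, side_length, D):
    """Apply one closure step to the components of the forest on a side.

    Returns (new_components, added_length). Components are assumed to be
    pairwise disjoint intervals inside [0, side_length].
    """
    def is_corner(c):
        return c[0] <= 0 or c[1] >= side_length

    corner = [c for c in components if is_corner(c)]
    non_corner = [c for c in components if not is_corner(c)]
    if len(non_corner) <= D:
        return sorted(components), 0.0  # the side already satisfies the bound

    # closure(X): the smallest connected subset of the line spanning X
    lo = min(a for a, _ in non_corner)
    hi = max(b for _, b in non_corner)
    added = (hi - lo) - sum(b - a for a, b in non_corner)
    return sorted(corner + [(lo, hi)]), added
```

After the closure is added, the side carries a single non-corner component, and the added length is at most the length of the side.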



SatisfyBoundary establishes the Boundary Property 

Consider a dissection square $R$, a side $S$ of $R$, and the dissection line $\ell$ containing $S$. The iteration involving $\ell$ and $j = \mathrm{depth}(\ell)$ ensures that, at the end of that iteration, there are at most $D$ components of $F_1 \cap S$ not including the endpoints of $S$, which are corners of $R$. We need to show that later iterations do not change this property.

Consider an iteration corresponding to $j' < j$, a line $\ell'$ with $j' \ge \mathrm{depth}(\ell')$, and a side $S' \subseteq \ell'$ of a $j'$-square $R'$. By the nesting property and since $S'$ cannot be enclosed by $S$, $S \cap \ell'$ is either empty, a corner of $R$, or equal to $S$. In the first case, $S \cap F_1$ is not affected by adding a segment of $S'$. In the second case, no new non-corner component of $F_1 \cap S$ appears. In the third case, adding a segment of $S'$ would reduce the number of components of $S \cap F_1$ to one. See Figure 4.



The increase in length due to SatisfyBoundary is small 

Figure 4: The second (right) and third (left) cases for showing that SatisfyBoundary can only decrease the number of components along the side of another dissection square or add a corner component when a segment (thick line) of a dissection square side ($R \cap \ell$) is added to $F$ (not shown).

For iteration $j$ of the outer loop and iteration $\ell$ of the second loop (so that $j \ge \mathrm{depth}(\ell)$), let random variable $C_{\ell,j}$ denote the number of executions of the last step:

  add closure(non-corner components of $F_1 \cap S$) to $F_1$.

Note that, conditioning on $\mathrm{depth}(\ell) \le j$, $C_{\ell,j}$ is independent of $\mathrm{depth}(\ell)$ (however $C_{\ell,j}$ does depend on the random shift in the direction perpendicular to $\ell$). Initially the number of non-corner components of $F_1 \cap \ell$ is at most the number of components, $|F_0 \cap \ell|$. As argued above: for every $j \ge \mathrm{depth}(\ell)$, every $j$-square either is disjoint from $\ell$ or has a side on $\ell$, so dealing with a line $\ell'$ parallel to $\ell$ does not increase the number of components on $\ell$; for every $j < \mathrm{depth}(\ell)$, dealing with a line $\ell'$ perpendicular to $\ell$ can only introduce a corner component on $\ell$. So, the total number of non-corner components on $\ell$ never increases. Since it decreases by at least $D$ at each of the $C_{\ell,j}$ closure operations, we have

$$\sum_{j=\mathrm{depth}(\ell)}^{\log L} C_{\ell,j} \le |F_0 \cap \ell| / D.$$

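Since $\mathrm{length}(S) = L/2^j$, the total increase in length resulting from these executions is at most $C_{\ell,j}(L/2^j)$. Therefore, the expected increase in length along $\ell$ is

$$\begin{aligned}
E(\mathrm{length}(F_1 \cap \ell) - \mathrm{length}(F_0 \cap \ell))
&\le \sum_i \mathrm{Prob}[\mathrm{depth}(\ell) = i] \sum_{j \ge i} E[C_{\ell,j} \mid \mathrm{depth}(\ell) = i]\, \frac{L}{2^j} \\
&= \sum_i \frac{2^i}{L} \sum_{j \ge i} E[C_{\ell,j} \mid \mathrm{depth}(\ell) \le j]\, \frac{L}{2^j} \\
&= \sum_j E[C_{\ell,j} \mid \mathrm{depth}(\ell) \le j] \sum_{i \le j} \frac{2^i}{2^j} \\
&\le 2\, E\Big[\sum_{j \ge \mathrm{depth}(\ell)} C_{\ell,j} \,\Big|\, \mathrm{depth}(\ell)\Big] \\
&\le 2\, |F_0 \cap \ell| / D.
\end{aligned}$$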

Summing over all dissection lines $\ell$, and using the bounds on $\sum_\ell |F_0 \cap \ell|$ and $D$ as given by Equations (3) and (7),
respectively, we infer that the length of Fi is at most ^OPT more than the length of Fq. 
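The factor of 2 in the bound above comes from swapping the two sums: for a fixed $j$, the weights $2^i/2^j$ over depths $i \le j$ form a geometric series. The short check below is a sanity test of that single step, assuming depths are indexed from 0; the helper name `swap_factor` is hypothetical.

```python
from fractions import Fraction


def swap_factor(j):
    """Return sum_{i=0}^{j} 2^i / 2^j exactly, as a Fraction."""
    return sum(Fraction(2 ** i, 2 ** j) for i in range(j + 1))
```

The value equals $2 - 2^{-j}$, so it stays strictly below 2 for every $j$, which is what the derivation uses.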



4.2 The Portal Property 

We establish the Portal Property constructively by starting with $F_2 = F_1$ and extending $F_2$ along the boundaries of dissection squares to nearest portals. We say a component is portal-free if it does not contain a portal. The following construction establishes the Portal Property:

SatisfyPortal:

  For each $j$ decreasing from $\log L$ to 0,
    for each dissection line $\ell$ such that $\mathrm{depth}(\ell) = j$,
      for each portal-free component $K$ of $F_2 \cap \ell$,
        extend $K$ to the nearest non-corner portal on $\ell$.
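The extension step can be sketched in one dimension. The code below is an illustration under simplifying assumptions that are not in the paper: portals on a line are taken to be the multiples of a fixed `spacing`, a component is a closed interval `[a, b]`, and the distinction between corner and non-corner portals is ignored. The name `extend_to_portal` is hypothetical.

```python
import math


def extend_to_portal(a, b, spacing):
    """Extend the interval [a, b] to the nearest portal if it is portal-free.

    Returns (new_interval, added_length).
    """
    first = math.ceil(a / spacing) * spacing  # first portal at or after a
    if first <= b:
        return (a, b), 0.0  # the component already contains a portal

    below = math.floor(a / spacing) * spacing  # last portal before a
    if a - below <= first - b:
        return (below, b), a - below  # nearest portal lies to the left
    return (a, first), first - b  # nearest portal lies to the right
```

In this model each extension adds at most the inter-portal distance `spacing`, which is the per-extension bound used in the length analysis.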



SatisfyPortal preserves the Boundary Property 



Focus on dissection line $\ell$. Before the iteration corresponding to $\ell$, possible extensions along lines $\ell'$ that are perpendicular to $\ell$ and of depth greater than or equal to $\mathrm{depth}(\ell)$ do not extend to $\ell$, because $\ell' \cap \ell$ is a corner of $\ell'$. After the iteration corresponding to $\ell$, for each possible extension along lines $\ell'$ that are perpendicular to $\ell$ and of depth strictly less than $\mathrm{depth}(\ell)$, $\ell' \cap \ell$ is a corner of any dissection square $R$ with a side along $\ell$ containing $\ell \cap \ell'$, so the Boundary Property for $\ell$ is not violated.



The increase in length due to SatisfyPortal is small 

Consider a dissection line $\ell$. When dealing with line $\ell$, SatisfyPortal only merges components and, in doing so, does not increase the number of components of $F_2 \cap \ell$. When dealing with a dissection line $\ell'$ perpendicular to $\ell$, SatisfyPortal might add the component $\ell \cap \ell'$ to $F_2 \cap \ell$. However, similar to the argument used above, in that case $\ell' \cap \ell$ is a corner of any dissection square $R$ with a side along $\ell$ containing $\ell \cap \ell'$. Since, by Lemma 2.5, corners are portals, no extension is made for this component. Therefore, each component of $F_2 \cap \ell$ that does not already contain a portal is an extension of what was originally already a component of $F_0 \cap \ell$ and so, at most $|F_0 \cap \ell|$ extensions are made along $\ell$.

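Each of these extensions adds a length of at most $L/(A\,2^{\mathrm{depth}(\ell)})$ (the inter-portal distance for line $\ell$). Therefore, the total length added along dissection line $\ell$ is bounded by $|F_0 \cap \ell|\, L/(A\,2^{\mathrm{depth}(\ell)})$. Since $\mathrm{Prob}[\mathrm{depth}(\ell) = i] = 2^i/L$, the expected increase in length due to dissection line $\ell$ is at most

$$\sum_{i=1}^{\log L} \frac{2^i}{L}\, |F_0 \cap \ell|\, \frac{L}{A\,2^i} = \frac{\log L}{A}\, |F_0 \cap \ell|.$$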



Summing over all dissection lines and using Equations |3]l and |6]), we infer that the length of F-2 is at most ^OPT 
more than the length of F\ . 



4.3 The Cell Property 

We establish the Cell Property constructively by starting with $F_3 = F_2$ and adding to $F_3$ boundaries of cells that violate the Cell Property. Let $C$ be a cell of a dissection square $R$. We say $C$ is happy with respect to the solution $F_3$ if there is at most one connected component of $F_3$ that touches both the interior of $C$ and $\partial R$. We cheer up an unhappy cell $C$ by adding to $F_3$ a subset $A$ of $\partial C$, as illustrated in Figure 5:

$$A(C, F_3) = \partial C \setminus \{\text{sides } S \text{ of } C : \mathrm{depth}(S) < \mathrm{depth}(C) \text{ and } S \cap F_3 = \emptyset\}. \qquad (8)$$

Recall that each cell $C$ of $R$ is either coincident with a dissection square that is a descendant of $R$ or is smaller than and enclosed by a leaf dissection square that is a descendant of $R$. Definitions for the depth of a cell and its sides are inherited from the definitions of dissection-square depths and dissection-line depths.
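Equation (8) is a simple filter on the four sides of a cell, and the three cases of Figure 5 fall out of it directly. The sketch below is illustrative only: the encoding of a side as a `(name, depth, intersects_F3)` triple and the name `augmentation_sides` are assumptions, and the question of whether the kept sides are open or closed at their endpoints is ignored.

```python
def augmentation_sides(sides, cell_depth):
    """Return the names of the sides of a cell that belong to A(C, F3).

    sides: list of (name, depth, intersects_F3) triples (hypothetical encoding).
    A side is dropped only if it is shallower than the cell AND misses F3.
    """
    return [
        name
        for name, depth, intersects in sides
        if not (depth < cell_depth and not intersects)
    ]
```

With a cell of depth 3 whose top and left sides have smaller depth, the result is the two depth-3 sides when neither shallow side meets $F_3$, three sides when one of them does, and all of $\partial C$ when both do.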

Happiness of all cells, and therefore the Cell Property, is established by the following procedure:

SatisfyCellAbstract:

  While there is an unhappy cell $C$,
    add $A(C, F_3)$ to $F_3$.

Let $\mathcal{C}$ be the set of cells that we augment in the above procedure.

We claim that there is a function $h$ from the cells $\mathcal{C}$ to the components of $F_0$ (the original forest that we started with prior to the Satisfy procedures) that is injective and such that, for a cell $C$ of dissection square $R$, $h(C)$ is a component of $F_0$ that intersects $\partial R$.

Figure 5: The three cases (a), (b) and (c) (up to symmetry) of augmenting $C$. The dotted lines are $F_3$, $C$ is the smaller square and $C$'s parent is the larger square (to illustrate the relative depth of $C$'s sides). In cases (a) and (b), the augmentation $A$ is not all of $\partial C$ so is open at the ends. In (a), $F_3$ intersects neither of the sides of $C$ that have depth less than that of $C$, so the augmentation $A$ consists only of the two sides having depth equal to that of $C$. In (b), one of the low-depth sides intersects $F_3$, so it belongs to $A$. In (c), both low-depth sides intersect $F_3$, so $A$ is all of $\partial C$.

To define $h$, consider the following abstract directed forest $H$ whose vertices correspond to connected components of $F_0$ and whose edges correspond to augmentations made by SatisfyCell (defined formally as follows). An augmentation for cell $C$ is triggered by the existence of at least two connected components $T$, $T'$ of the current $F_3$ that both touch the interior of $C$ and the boundary of its associated dissection square $R$. Since the Satisfy procedures augment the solution, $T$ and $T'$ each contain (at least one) connected component, $T_0$ and $T'_0$ respectively, of $F_0$; it is the vertices corresponding to $T_0$ and $T'_0$ that are adjacent in $H$. We will show shortly that there exist such components that intersect $\partial R$. Arbitrarily root each tree of $H$ and direct each of its edges away from the root. For augmentation of cell $C$, we then define $h(C)$ as the component of $F_0$ that corresponds to the head of the edge of $H$ associated with the augmentation of $C$. Since each vertex of $H$ has indegree at most 1, $h$ is injective.
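The injectivity argument (root each tree, direct edges away from the root, map each edge to its head) can be checked mechanically on a small forest. The sketch below assumes the forest $H$ is given as a list of `(u, v, cell)` edges over vertices `0..num_vertices-1`, one edge per augmented cell; this encoding and the name `assign_heads` are illustrative assumptions.

```python
from collections import defaultdict, deque


def assign_heads(num_vertices, edges):
    """Root every tree of a forest and map each edge label to its head vertex.

    edges: list of (u, v, cell) triples forming an acyclic graph.
    Returns a dict {cell: head vertex}; the roots are chosen arbitrarily
    (here: the smallest unvisited vertex of each tree).
    """
    adjacency = defaultdict(list)
    for u, v, cell in edges:
        adjacency[u].append((v, cell))
        adjacency[v].append((u, cell))

    seen = [False] * num_vertices
    head = {}
    for root in range(num_vertices):
        if seen[root]:
            continue
        seen[root] = True
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, cell in adjacency[u]:
                if not seen[v]:
                    seen[v] = True
                    head[cell] = v  # the edge is directed from u to v
                    queue.append(v)
    return head
```

Because every non-root vertex is discovered exactly once, no two edges receive the same head, which is the indegree-at-most-1 property that makes $h$ injective.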

We show, by way of contradiction, that there is a component of $F_0$ contained by $T$ that intersects $\partial R$. Consider all the components $T_0$ of $F_0$ that are contained by $T$ and suppose none of these intersect $\partial R$. Let $\ell$ be a dissection line bounding $R$ that $T$ intersects. Since $T_0$ does not intersect $\partial R$, $T$ must have been created from $T_0$ by augmentations (by way of SatisfyBoundary and SatisfyCell), one of which added a subset $X$ of a dissection line $\ell'$ such that $X$ intersects $\ell$. Since $T_0$ does not intersect $X$ and neither SatisfyBoundary nor SatisfyCell augment to the corner of a dissection line, $\ell$ and $\ell'$ must be perpendicular. Further, $X$ is a subset of a side $S'$ of a square $R'$ and does not contain a corner of $R'$. In summary, $R$ and $R'$ are dissection squares bounded by perpendicular dissection lines $\ell$ and $\ell'$ but for which $\ell \cap \ell'$ is not a corner of $R'$ or $R$, contradicting that dissection squares nest.

We are now ready to give an implementation of SatisfyCellAbstract:

SatisfyCell:

  For each dissection line $\ell$,
    for $j$ decreasing from $\log L$ to $\mathrm{depth}(\ell)$,
      for each $j$-square $R$ with side $S \subseteq \ell$,
        while there is an unhappy cell $C$ such that $h(C)$ intersects $\ell$,
          add $A(C, F_3)$ to $F_3$.

Since $h(C)$ intersects some side of some dissection square, this procedure makes each of the cells happy.

The increase in length due to SatisfyCell is small

Let the random variable $C_{\ell,j}$ denote the number of augmentations corresponding to dissection line $\ell$ and index $j$. Thanks to the injective mapping $h$, we have:

$$\sum_j C_{\ell,j} \le |F_0 \cap \ell|.$$

Since a cell has boundary length shorter than its $j$-square by a factor of $B$, the total increase in length corresponding to these iterations is at most $C_{\ell,j}\,\mathrm{length}(j\text{-square})/B$. Summing over $j$, the total length added by SatisfyCell corresponding to dissection line $\ell$ is at most

$$\sum_{j \ge \mathrm{depth}(\ell)} C_{\ell,j}\, \frac{4L}{2^j B}.$$

Since the probability that grid line $\ell$ is a dissection line of depth $k$ is $2^k/L$, the expected increase in length added by SatisfyCell corresponding to dissection line $\ell$ is at most

$$\sum_k \frac{2^k}{L} \sum_{j \ge k} E[C_{\ell,j} \mid \mathrm{depth}(\ell) = k]\, \frac{4L}{2^j B}.$$

As in Section 4.1, we observe that $C_{\ell,j}$ conditioned on $\mathrm{depth}(\ell) \le j$ is independent of $\mathrm{depth}(\ell)$. By the same swapping of sums as before, this is then bounded by

$$\frac{8}{B}\, E\Big[\sum_{j \ge \mathrm{depth}(\ell)} C_{\ell,j} \,\Big|\, \mathrm{depth}(\ell)\Big] \le \frac{8}{B}\, |F_0 \cap \ell|.$$

Summing over all dissection lines, our bound on the expected additional length becomes 

-]T|^ru| = -(i + e)OFr 

t 

For B = 240/e, this is at most ^OPT by Equation ([!}■ 



SatisfyCell maintains the Boundary and Portal Properties 

We show that SatisfyCell maintains the Boundary and Portal Properties by showing that for any forest $F$ satisfying the Boundary and Portal Properties, any single SatisfyCell augmentation of $F$ also satisfies these properties.

Let $C$ be an unhappy cell and let $R$ be a dissection square satisfying the Boundary and Portal Properties. Let $A$ be the augmentation that is used to cheer up $C$. If $A \cap \partial R$ contains a corner of $R$, then the Boundary Property is satisfied because $A \cap \partial R$ would be a corner component and the Portal Property is satisfied because the corners of dissection squares are portals.

So, suppose that $A \cap \partial R$ is not empty but does not contain a corner of $R$. Refer to Figure 6 for relative positions of $R$ and $C$. Then $\partial C \cap \partial R$ cannot include an entire side of $R$, so it must be that $\mathrm{depth}(C) > \mathrm{depth}(R)$. Further, if $A \cap \partial R$ does not include a corner of $R$, then $A \cap \partial R$ must be a subset of a single dissection line, $\ell$.




Figure 6: Relative positions of $R$ and $C$.



If $A \cap \ell \cap F$ is not empty, then $F \cap \ell$ is not empty. Since $F$ satisfies the Portal Property, $F \cap \ell$ also includes a portal. Since the addition of $A$ can only act to merge components, $|\ell \cap \partial R \cap (F \cup A)| \le |\ell \cap \partial R \cap F|$ and so the augmented forest still satisfies the Boundary Property.

If $A \cap \partial R \cap F$ is empty, then, by Equation (8), $\mathrm{depth}(\ell) \ge \mathrm{depth}(C)$. But $\mathrm{depth}(C) > \mathrm{depth}(R)$, so $\mathrm{depth}(\ell) > \mathrm{depth}(R)$. This is impossible because $\ell$ is a line bounding $R$.

This completes the proof of Theorem 3.1.



4.4 Proof of Theorem 1.1

Recall Theorem 1.1, stating that there is a randomized $O(n\,\mathrm{polylog}\,n)$-time approximation scheme for the Steiner forest problem in the Euclidean plane. The proof of this theorem is a corollary of Theorems 3.1, 3.4 and 2.1 and Lemma 2.2, as follows. Theorem 3.4 guarantees that we can compute, using dynamic programming, a solution that satisfies Theorem 3.1. Section 3.2 argues that this DP takes $O(n\,\mathrm{polylog}\,n)$ time. Lemma 2.2 and Theorem 2.1 show that we can convert the solution(s), of near-optimal cost, guaranteed by Theorem 3.1 to near-optimal solutions for the original problem, thus giving Theorem 1.1.



5 Conclusion 

We have given a randomized $O(n\,\mathrm{polylog}\,n)$-time approximation scheme for the Steiner forest problem in the Euclidean plane. Prior to this result, polynomial-time approximation schemes (PTASes) had been given for subset-TSP [14] and Steiner tree [9, 10] in planar graphs, using ideas inspired by their geometric counterparts. Since the conference version of this paper appeared, a PTAS has been given for Steiner forest in planar graphs by Bateni et al. [6]. As in our result here, Bateni et al. first partition the problem and then face the same issue of maintaining feasibility that we presented in Section 1.3, except in graphs of bounded treewidth. They overcome this by giving a PTAS for Steiner forest in graphs of bounded treewidth; they also show that this problem is NP-complete, even in graphs of treewidth 3. Recently we have seen this technique generalized to prize-collecting versions of the problem for both Euclidean and planar [5] instances.